This work studies the task of glossification, whose aim is to empower communication for the deaf (hard-of-hearing) community by transcribing natural spoken language sentences into sign language glosses. Previous sequence-to-sequence models trained on paired sentence-gloss data often fail to capture the rich connections between the two distinct languages, leading to unsatisfactory transcriptions. We observe that despite different grammars, glosses effectively simplify sentences for the ease of deaf communication, while sharing a large portion of vocabulary with sentences. This motivates us to implement glossification by executing a collection of editing actions, e.g., word addition, deletion, and copying, called editing programs, on their natural language counterparts. Specifically, we design a new neural agent that learns to synthesize and execute editing programs, conditioned on sentence contexts and partial editing results. The agent is trained to imitate minimal editing programs while exploring the program space more widely via policy gradients, so as to optimize sequence-wise transcription quality. Results show that our approach outperforms previous glossification models.
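The notion of an editing program can be illustrated with a minimal sketch. This is not the paper's implementation; the action names and the executor below are illustrative assumptions, showing only how a sequence of addition, deletion, and copy actions rewrites a sentence into glosses:

```python
# Minimal sketch (not the paper's code) of executing an editing program --
# a sequence of ADD / DEL / COPY actions -- over a source sentence to
# produce a gloss sequence. Action names are illustrative.

def execute_program(sentence_tokens, program):
    """Apply edit actions left-to-right over the source tokens.

    COPY keeps the current source token, DEL skips it, and
    ("ADD", word) emits a new gloss token without consuming input.
    """
    out, i = [], 0
    for action in program:
        if action == "COPY":
            out.append(sentence_tokens[i])
            i += 1
        elif action == "DEL":
            i += 1
        elif isinstance(action, tuple) and action[0] == "ADD":
            out.append(action[1])
        else:
            raise ValueError(f"unknown action: {action!r}")
    return out

# Glosses simplify the sentence while sharing much of its vocabulary:
tokens = ["do", "you", "like", "to", "watch", "movies"]
program = ["DEL", "COPY", "COPY", "DEL", "DEL", "COPY", ("ADD", "?")]
print(execute_program(tokens, program))  # ['you', 'like', 'movies', '?']
```

Because the output shares most of its vocabulary with the input, a short program of such actions suffices, which is what makes imitation of minimal editing programs a sensible training signal.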
translated by Google Translate
Federated learning learns from decentralized data by fusing collaborative models from local nodes. However, the conventional coordinate-based model averaging of FedAvg ignores the random information encoded in each parameter and may suffer from structural feature misalignment. In this work, we propose Fed2, a feature-aligned federated learning framework that resolves this issue by establishing a firm structure-feature alignment across the collaborative models. Fed2 consists of two major designs. First, we design a feature-oriented model structure adaptation method to ensure explicit feature allocation in different neural network structures. By applying the structure adaptation to the collaborative models, matchable structures with similar feature information can be initialized at a very early training stage. Then, during the federated learning process, we propose a feature-paired averaging scheme to guarantee aligned feature distributions and avoid feature fusion conflicts under either IID or non-IID scenarios. Eventually, Fed2 can effectively enhance federated learning convergence under extensive homogeneous and heterogeneous settings, providing excellent convergence speed, accuracy, and computation/communication efficiency.
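The misalignment problem can be seen in a toy sketch. This is a hedged illustration, not Fed2's method: the "pairing" below is a hand-given channel permutation, whereas Fed2 obtains alignment through its structure adaptation. It only shows why coordinate-wise averaging smears features that live in different channel positions across clients:

```python
# Hedged sketch contrasting plain coordinate-wise FedAvg with a
# feature-paired average that first matches channels across clients.
# The pairing is a toy permutation; the real alignment is learned.

def fedavg(models):
    """Coordinate-wise average: ignores which feature each channel encodes."""
    n = len(models)
    return [sum(m[i] for m in models) / n for i in range(len(models[0]))]

def paired_avg(models, pairings):
    """Average channel i of the reference model with its matched channel
    pairings[k-1][i] in every other client model k."""
    n = len(models)
    return [
        (models[0][i] + sum(models[k][pairings[k - 1][i]] for k in range(1, n))) / n
        for i in range(len(models[0]))
    ]

# Two clients learned the same two features, but in swapped channel order:
client_a = [1.0, 9.0]
client_b = [9.2, 1.1]  # channel 0 of B corresponds to channel 1 of A

print(fedavg([client_a, client_b]))                 # features smeared together
print(paired_avg([client_a, client_b], [[1, 0]]))   # features preserved
```

Coordinate averaging yields roughly [5.1, 5.05], destroying both features, while averaging after pairing yields roughly [1.05, 9.1], keeping each feature intact.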
Deep learning (DL) models have achieved superior performance in many application domains, including vision, language, medicine, commercial advertising, entertainment, etc. With this rapid development, both DL applications and the underlying serving hardware have demonstrated strong scaling trends: model scaling, e.g., recent pre-trained models with hundreds of billions of parameters and ~TB-level memory consumption, and compute scaling, e.g., the latest GPU accelerators delivering hundreds of TFLOPS. Under these scaling trends, new problems and challenges emerge for DL inference serving systems, which are gradually evolving toward large-scale deep learning serving systems (LDS). This survey aims to summarize and categorize the emerging challenges and optimization opportunities for large-scale deep learning serving systems. By providing a novel taxonomy, summarizing the computing paradigms, and elaborating on recent technical advances, we hope this survey can shed light on new optimization perspectives and motivate novel works in large-scale deep learning system optimization.
This paper investigates a phenomenon where query-based object detectors mispredict at the last decoding stage while predicting correctly at an intermediate stage. We review the training process and attribute the overlooked phenomenon to two limitations: lack of training emphasis and cascading errors from the decoding sequence. We design and present Selective Query Recollection (SQR), a simple and effective training strategy for query-based object detectors. It cumulatively collects intermediate queries as decoding stages go deeper and selectively forwards the queries to the downstream stages aside from the sequential structure. In this way, SQR places training emphasis on later stages and allows later stages to work directly with intermediate queries from earlier stages. SQR can be easily plugged into various query-based object detectors and significantly enhances their performance while leaving the inference pipeline unchanged. We apply SQR to Adamixer, DAB-DETR, and Deformable-DETR across various settings (backbone, number of queries, schedule), and it consistently brings a 1.4-2.8 AP improvement.
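The "cumulatively collects ... and selectively forwards" routing can be sketched with a back-of-the-envelope count. This is an illustrative reading, not the authors' code: assuming each stage consumes the output query groups of its two nearest prior stages (with the initial queries acting as a prior "stage"), the number of groups per stage grows Fibonacci-like, which is how training emphasis shifts toward later stages:

```python
# Illustrative count (not the authors' code) of query groups fed to each
# decoding stage under Selective Query Recollection, assuming each stage
# consumes the outputs of the two nearest prior stages and the initial
# queries count as a prior "stage".

def sqr_group_counts(num_stages):
    counts = []
    for s in range(num_stages):
        if s == 0:
            counts.append(1)                  # initial queries only
        elif s == 1:
            counts.append(counts[0] + 1)      # stage-0 outputs + initial queries
        else:
            counts.append(counts[s - 1] + counts[s - 2])
    return counts

print(sqr_group_counts(6))  # [1, 2, 3, 5, 8, 13]
```

Later stages thus see far more (and earlier) queries than a plain sequential decoder would give them, while inference still runs the unchanged sequential pipeline.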
Continual Learning, also known as Lifelong or Incremental Learning, has recently gained renewed interest among the Artificial Intelligence research community. Recent research efforts have quickly led to the design of novel algorithms able to reduce the impact of the catastrophic forgetting phenomenon in deep neural networks. Due to this surge of interest in the field, many competitions have been held in recent years, as they are an excellent opportunity to stimulate research in promising directions. This paper summarizes the ideas, design choices, rules, and results of the challenge held at the 3rd Continual Learning in Computer Vision (CLVision) Workshop at CVPR 2022. The focus of this competition is the complex continual object detection task, which is still underexplored in the literature compared to classification tasks. The competition is based on the challenge version of the novel EgoObjects dataset, a large-scale egocentric object dataset explicitly designed to benchmark continual learning algorithms for egocentric category-/instance-level object understanding, which covers more than 1k unique main objects and 250+ categories in around 100k video frames.
With the development of deep learning and Transformer-based pre-trained models like BERT, the accuracy of many NLP tasks has been dramatically improved. However, the large number of parameters and computations also pose challenges for their deployment. For instance, using BERT can improve the predictions in the financial sentiment analysis (FSA) task but slows it down, even though speed and accuracy are equally important in terms of profits. To address these issues, we first propose an efficient and lightweight BERT (ELBERT) along with a novel confidence-window-based (CWB) early exit mechanism. Based on ELBERT, an innovative method to accelerate text processing on the GPU platform is developed, solving the difficult problem of making the early exit mechanism work more effectively with a large input batch size. Afterward, a fast and high-accuracy FSA system is built. Experimental results show that the proposed CWB early exit mechanism achieves significantly higher accuracy than existing early exit methods on BERT under the same computation cost. By using this acceleration method, our FSA system can boost the processing speed by nearly 40 times to over 1000 texts per second with sufficient accuracy, which is nearly twice as fast as FastBERT, thus providing a more powerful text processing capability for modern trading systems.
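The intuition behind a confidence-window-based exit can be sketched as follows. This is a hedged illustration, not ELBERT's exact rule: the window semantics and threshold below are assumptions, showing only the idea of exiting when confidence stays high across consecutive intermediate classifiers rather than spiking once at a single layer:

```python
# Hedged sketch of a confidence-window-based early exit: instead of exiting
# as soon as one layer's confidence clears a threshold, exit only after
# confidence stays above the threshold for `window` consecutive
# intermediate classifiers. Threshold and window here are illustrative.

def early_exit_layer(confidences, threshold=0.9, window=2):
    """Return the 1-based layer index at which to exit, or the last
    layer if the window condition is never met."""
    run = 0
    for i, c in enumerate(confidences, start=1):
        run = run + 1 if c >= threshold else 0
        if run >= window:
            return i
    return len(confidences)

# A single confident layer (layer 2) is not enough; two in a row are:
print(early_exit_layer([0.4, 0.93, 0.6, 0.95, 0.97, 0.99]))  # 5
```

Requiring agreement over a window guards against exiting on a transient confidence spike, at the cost of running a few more layers for easy inputs.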
In this paper, we propose a method for identifying identical products. In e-commerce scenarios, products are usually described by both images and text. By definition, identical products are those that share the same key attributes and are cognitively identical to consumers. There are two main challenges: 1) the extraction and fusion of multimodal representations; 2) the ability to verify whether two products are identical by comparing the distance between their representations against a threshold. To address the above problems, we propose an end-to-end identical-product verification method based on an adaptive threshold. We use a two-stream network to extract product embeddings and threshold embeddings separately, and then concatenate them to obtain the product representation. Our method is able to obtain different thresholds for different products while keeping the whole product representation indexable. In experiments, we verify the effectiveness of multimodal feature fusion and the advantage of the adaptive threshold. Furthermore, our method achieves an F1 score of 0.8936 and ranks third on the leaderboard of the second task of the CCKS-2022 Knowledge Graph Evaluation for Digital Commerce competition. Code and pretrained models are available at https://github.com/hanchenchen/ccks2022-track2-solution.
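The role of a per-product threshold can be shown with a minimal sketch. This is not the paper's model: the scalar thresholds and the pairwise decision rule below are illustrative assumptions (the real method learns both embedding streams end to end), meant only to show how different products can carry different matching strictness:

```python
import math

# Illustrative sketch of identical-product verification with an adaptive,
# per-product threshold: each item carries its own embedding plus a scalar
# threshold (standing in for a learned threshold embedding); two items
# match if their embedding distance falls below the averaged thresholds.

def verify_same(item_a, item_b):
    emb_a, thr_a = item_a
    emb_b, thr_b = item_b
    dist = math.dist(emb_a, emb_b)       # Euclidean distance
    return dist < (thr_a + thr_b) / 2    # per-pair adaptive threshold

strict_item  = ([0.0, 1.0], 0.2)   # distinctive product: tight threshold
lenient_item = ([0.1, 1.0], 0.4)   # many near-duplicates: looser threshold

print(verify_same(strict_item, lenient_item))  # True: dist 0.1 < 0.3
```

Because the threshold travels with the representation, a fixed-radius nearest-neighbor index can still be used, which is what "keeping the representation indexable" refers to.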
In this paper, we study the graphic layout generation problem of producing high-quality visual-textual presentation designs for a given image. We note that image compositions, which contain not only global semantics but also spatial information, largely affect layout results. Hence, we propose a deep generative model, termed the Composition-aware Graphic Layout GAN (CGL-GAN), to synthesize layouts based on the global and spatial visual contents of input images. To obtain training images from images that already contain manually designed graphic layouts, previous work suggests masking design elements (e.g., texts and embellishments) as model inputs, which inevitably leaves hints of the ground truth. We study the misalignment between the training inputs (with hint masks) and the test inputs (without masks), and design a novel domain alignment module (DAM) to narrow this gap. For training, we construct a large-scale layout dataset consisting of 60,548 advertising posters with annotated layout information. To evaluate the generated layouts, we propose three novel metrics according to aesthetic intuitions. Through both quantitative and qualitative evaluations, we demonstrate that the proposed model can synthesize high-quality graphic layouts according to image compositions.
To make full use of computer vision technology in stores, it is necessary to consider the practical needs that fit the characteristics of the retail scene. To this end, we introduce the United Retail Datasets (Unitail), a large-scale benchmark of basic visual tasks on products, targeting detection, reading, and matching algorithms. With 1.8 million quadrilateral-shaped instances annotated, the Unitail offers a detection dataset that better aligns with product appearance. Furthermore, it provides a gallery-style OCR dataset containing 1,454 product categories, 30k text regions, and 21k transcriptions to enable robust reading on products and to motivate enhanced product matching. Besides benchmarking the datasets with various state-of-the-art techniques, we also customize a new detector for product detection and provide a simple OCR-based matching solution to verify its effectiveness.
In this paper, we present WenetSpeech, a multi-domain Mandarin corpus consisting of 10000+ hours of high-quality labeled speech, 2400+ hours of weakly labeled speech, and about 10000 hours of unlabeled speech, with 22400+ hours in total. We collect the data from YouTube and podcasts, covering a variety of speaking styles, scenarios, domains, topics, and noisy conditions. An optical character recognition (OCR)-based method is introduced to generate audio/text segmentation candidates for the YouTube data from its corresponding video subtitles, while a high-quality ASR transcription system is used to generate audio/text pair candidates for the podcast data. We then propose a novel end-to-end label error detection approach to further validate and filter the candidates. We also provide three manually labeled high-quality test sets along with WenetSpeech for evaluation: Dev, for cross-validation purposes in training; Test_Net, collected from the Internet as a matched test; and Test_Meeting, recorded from real meetings as a more challenging mismatched test. Baseline systems trained with WenetSpeech are provided for three popular speech recognition toolkits, namely Kaldi, ESPnet, and WeNet, and recognition results on the three test sets are also provided as benchmarks. To the best of our knowledge, WenetSpeech is currently the largest open-source Mandarin speech corpus with transcriptions, and it benefits research on production-level speech recognition.